

ADAPTATION OF ARABIC SCRIPT 


The major scripts used for national languages can be divided 
into seven major script families, and each of these can be 
traced back to a particular language. Each script can be 
associated with a particular great cultural tradition, which 
as it spread, spread not only its script but also other 
cultural trappings, notably religion. The spread of Indian 
culture and Buddhist religion and of Arabic culture and 
Islamic religion was quite naturally associated with a 
spread of Devanagari and Arabic scripts respectively, just as 
today the ascendancy of certain cultures and ideas has gone 
hand in hand with the spread of Roman and Cyrillic based 
scripts. 


Population          Language
(millions)     %    Basis      Religious Background
   1893       39%   Latin      Western Christianity
   1212       25%   Chinese    Confucian, etc.
    362       20%   Sanskrit   Buddhist
    432             Arabic     Muslim
    307        6%   Greek      Orthodox, Scientific Atheism
     44             Ethiopian  Orthodox
      4                        Judaism


World Population by Script,
showing language basis of script and
religion associated with its cultural tradition


Because of its close relationship with cultural tradition, 
choice of orthography and script is a very emotional issue. 
This is particularly noticeable today in the association of 
Islam and Arabic based scripts. With a long history of use 
and the high value placed in many of these cultures on the 
art of calligraphy, it is no wonder that the trend in this 
century toward Roman and Cyrillic orthographies is viewed as 
a cultural affront. 


We will discuss some of the special characteristics of the 
Arabic script itself, then look at some of the ways in which 
it has been modified for other languages, and finally 
discuss some principles helpful to those who may wish to 
further adapt Arabic script. I should say that my own 
experience is weak as regards Arabic itself and the 
modifications of the script in Africa and Southeast Asia. 
If you have more experience than I, I would appreciate your 
input either by writing comments or by completing the 
questionnaire included in the appendix. 


THE WRITTEN ARABIC LANGUAGE 


Adaptation of Arabic Script Oct 87 


[Figure: The Arabic alphabet]


The Arabic alphabet has 28 consonants written from right to 
left and generally connected in a cursive fashion. Half of 
these have dots placed above or below them which distinguish 
them from other consonants with different combinations of 
dots or with no dots at all. Vowels are indicated by 
diacritics and by certain ambiguous consonant symbols. 
Several additional marks are used to indicate absence of 
vowel, geminate consonants or other less frequent phenomena. 


Cursive. The letters of a word are generally connected 
together, like English cursive handwriting. Many of the 
letters have a final flourish which is deleted to simplify 
joining to the following letter. Joining to the preceding 
letter requires only small changes, usually just the 
addition of a small connecting line. Thus, letters 
generally have four forms: initial, medial, final and 
isolate. 


Some letters, called "non-joiners", break the cursive 
pattern; they can be connected only on the right to the 
preceding letter. Thus, a non-joiner has only two forms: 
final and isolate. Non-joiners will break a word into two 
or more connected parts called "ligatures", with the 
non-joiner itself at the end (left) of the ligature. The 
letter following a non-joiner is in initial (or isolate) 
form, because although it is not word initial, it is 
ligature initial. 
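

The joining rule just described can be sketched in a few 
lines of Python. This is only an illustration: the letter 
names follow this paper, the non-joiner set is the usual 
one for Arabic itself (other languages may differ), and 
the function names are invented for the sketch. 

```python
# Non-joiners of Arabic itself; an assumption for any other language.
NON_JOINERS = {"ALEPH", "DAL", "DHAL", "RE", "ZE", "WAU"}


def contextual_forms(letters):
    """Return (letter, form) pairs for one word, in reading order."""
    result = []
    prev_joins_forward = False  # can the previous letter connect to this one?
    for index, letter in enumerate(letters):
        is_last = index == len(letters) - 1
        joins_forward = letter not in NON_JOINERS and not is_last
        if prev_joins_forward and joins_forward:
            form = "medial"
        elif prev_joins_forward:
            form = "final"
        elif joins_forward:
            form = "initial"
        else:
            form = "isolate"
        result.append((letter, form))
        prev_joins_forward = joins_forward
    return result


def ligatures(letters):
    """Split a word into connected parts ("ligatures" in this paper's sense)."""
    parts, current = [], []
    for letter, form in contextual_forms(letters):
        current.append(letter)
        if form in ("final", "isolate"):  # nothing connects onward from here
            parts.append(current)
            current = []
    return parts


word = ["KAF", "TE", "ALEPH", "BE"]
print(contextual_forms(word))
print(ligatures(word))
```

In the sample word the non-joiner ALEPH closes the first 
ligature, so the following BE stands in isolate form even 
though it is not word initial. 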


[Figure: Examples of Arabic cursive shapes, showing the 
isolate, final, medial and initial forms of sample letters 
and the letters connected together] 


Dots. Many letters share the same basic shape but differ 
only by combinations of dots placed above or below the 
letter. People generally write the base forms of all the 
letters of a word, and then return to place the appropriate 
number of dots above and below. But these dots are not 
diacritics; they form an integral part of the letter, much 
like the dot on the English i, which in certain cursive 
styles is the only thing to distinguish it from an e. 


This "dotting" feature has given Arabic script a great deal 
of flexibility which other languages have used to great 
advantage. 


The letters in a "shape group", that is, with the same basic 
shape, have the same four calligraphic forms (initial, 
medial, final and isolate) except for the number and 
placement of dots. 


There are a few oddities, however. FE and QAF have distinct 
isolate and final shapes, having a shallow and deep curve 
respectively. (In North Africa, these letters are written 
differently). But the initial and medial forms are 
distinguished from each other only by dots. Similarly, NUN 
and YE each has its own unique isolate and final shape, but 
the initial and medial forms are distinguished from the BE 
shape group only by dots. 


I have not seen any modifications of the script which add 
any more such oddities to the script. 
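

One way to picture shape groups is as a table of base shape 
plus dots. The sketch below covers only a handful of 
letters, uses this paper's letter names, and invents the 
table layout and function name for illustration. It shows 
NUN, YE and QAF joining another group in initial and medial 
position, as described above. 

```python
from collections import defaultdict

# letter -> (own base shape, number of dots, where the dots sit)
DOTS = {
    "BE": ("BE", 1, "below"),
    "TE": ("BE", 2, "above"),
    "SE": ("BE", 3, "above"),
    "RE": ("RE", 0, None),
    "ZE": ("RE", 1, "above"),
    "SIN": ("SIN", 0, None),
    "SHIN": ("SIN", 3, "above"),
    "FE": ("FE", 1, "above"),
    "QAF": ("QAF", 2, "above"),
    "NUN": ("NUN", 1, "above"),
    "YE": ("YE", 2, "below"),
}

# Letters whose initial and medial forms use another group's shape.
BORROWED_SHAPE = {"NUN": "BE", "YE": "BE", "QAF": "FE"}


def shape_groups(position):
    """Group the sample letters by the shape they take in one position."""
    groups = defaultdict(list)
    for name, (shape, count, place) in DOTS.items():
        if position in ("initial", "medial"):
            shape = BORROWED_SHAPE.get(name, shape)
        groups[shape].append((name, count, place))
    return dict(groups)


print(shape_groups("isolate")["BE"])
print(shape_groups("medial")["BE"])
```

Within each group every letter has a different dot pattern, 
which is what keeps the letters distinct once the shapes 
merge. 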


Vowels. Three characters are used to indicate the long 
vowels. Two of these (WAU and YE) can ambiguously represent 
the semivowels; the third (ALEPH) is associated with the 
glottal stop consonant. 


[Figure: Letters which break the shape group patterns, 
showing the isolate, final, medial and initial forms of BE, 
FE, QAF and related letters] 


Diacritics above or below a consonant indicate when it is 
followed by a short vowel or by no vowel. Although the long 
vowel letters are almost always written, these diacritics 
are used mainly in the Quran, poetry or beginners' texts, to 
ensure proper pronunciation. 


This vowel system has proved a challenge for those adapting 
the Arabic script to other languages. 


HISTORY OF ARABIC SCRIPTS AND CALLIGRAPHY 


Of the script families in use today, five -- Amharic, 
Arabic, Greek/Cyrillic, Hebrew and Latin -- are distant 
cousins, tracing their ancestry back to a script in use 
about 1000 BC in Phoenicia. This non-cursive script had 
22 consonants and no way of writing vowels. 


By the 8th century BC, certain letters had become ambiguous 
"sometimes vowels", like our English w and y. A century 
later the script was borrowed by the ancestor of Amharic, 
and also by the ancestor of the Greek alphabet. The Greeks 
dropped the consonantal value of the ambiguous letters, but 
the ambiguity was reintroduced in Latin and continued in 
English (u/v and i/j) until recent centuries. 


The ancestor script of modern Hebrew broke off some time 
after Greek and Amharic; this may explain why Hebrew and 
Arabic are both written from right to left. 


The "Arabic" branch of the family did not make the Greek 
vowel innovation, but retained the original ambiguous 
notation. By about the third century AD, varieties of this 
alphabet were used for Syriac, Palmyrene and Nabatean. All 
three developed a system of cursive writing whereby many of 
the letters within a word were connected together. 


With the cursive system came the differentiation of the 
forms depending on position in the word. Further, certain 
letters lost their distinctive shape. Syriac early 
developed the technique of disambiguating such similar 
letters by placing dots over or under them, but Nabatean, 
the ancestor of Arabic, did not attempt to use this device. 


Jazm, the first script of the Quran, had developed from 
Nabatean by the 7th century. This jerky, angular script 
still did not consistently use the dots necessary to 
differentiate all of the consonants. In fact, this 
convention remained somewhat sporadic even up to medieval 
times! Long vowels were indicated, after the Nabatean 
system, but not consistently. The word Allah was pronounced 
with a long /a/ in the second syllable, but spelt without an 
ALEPH. Modern spelling also omits the ALEPH, placing 
instead a small raised ALEPH above the LAM. 


The early Jazm script was soon superseded for Quranic 
copying by Kufic, another angular script. Varieties of this 
script are still widely used in title pages, carvings and 
other decorative work, and in the Maghribi (Western) script 
of Morocco and Algeria. 


The need to correctly read the Quran encouraged further 
developments. Not only were consonantal dots and long 
vowels important, but by the 8th century a standard system 
of marking short vowels was adopted, at least for the Quran. 
But the Eastern pronunciation was imposed on the received 
Meccan consonantal orthography, creating anomalies visible 
today, such as the spelling of 'Isa with a final YE plus 
small raised ALEPH. 


Naskh calligraphy, with its smooth curves, evolved in the 
ninth century, and developed into the "flat" styles popular 
in the subcontinent and other eastern parts of the Islamic 
world. This and other styles of calligraphy were 
systematized in the tenth century by ibn Muqlah based on 
three units: the dot; the ALEPH, whose height in dots 
characterized each style; and the circle, whose diameter was 
equal to an ALEPH. 


Other calligraphic styles have developed over the centuries. 
Taliq, a "hanging" or "sloping" style, dates from the 
fifteenth century in Persia. Persian calligraphy has 
generally followed this pattern, but the introduction of 
movable type brought the acceptance of a more flat style. 
Nastaliq (Naskh-Taliq) style is a combination of Naskh and 
Taliq, and is the only style fully acceptable for Urdu. 


Mechanization is much more difficult than with the flat 
styles, so almost all Urdu printing is photo-reproduced from 
the work of a calligrapher. Typewriters and most computers 
use a flat style, though one newspaper has an acceptable 
computerized Nastaliq font and a hybrid style has been 
developed using the ED and MS programs developed by JAARS, 
Inc. 


Ruqa (or Riqa?) style dates to the same era as Taliq, and 
became widely accepted during the nineteenth century, 
developing into the handwriting style used for Arabic 
today. 


Today one can see a great variety in the use of different 
script styles in the Arabic script world. Distinctive 
styles are used by different languages and in different 
regions, and even within the same text for emphasis, as we 
might use italic and roman together. But much more has been 
done to adapt the script to languages which are sometimes 
quite different from the original Arabic. 




ADAPTING ARABIC SCRIPT TO OTHER LANGUAGES 


Arabic script has been adapted to many kinds of languages: 
Semitic, Indo-European, Turkic, Slavic, Malayo-Polynesian 
and African, including Persian, Urdu, Pashtu, Sindhi, Malay, 
Turkish, Swahili, Spanish, Hebrew, Berber, Sudanese. This 
has required many kinds of adaptations. 


[Figure: Derivation of Arabic Based Scripts, a tree leading 
from Arabic to Swahili, Persian, Malay, Philippine 
languages, Kurdish, Pashtu, Urdu and Sindhi] 


Some Arabic phonemes are not found in other languages. A 
few languages delete the corresponding "Arabic" letters from 
their alphabet, but most retain them, keeping the alphabet 
of the Quran a subset of each Arabic-based alphabet. The 
"non-native" letters are assigned a phonetically close 
"native" sound, resulting in over-differentiation and 
spelling problems. 


Notice that this leads to multiple pronunciations of some 
Arabic words in English. The month of fasting is generally 
called "Ramadan" when it is borrowed into English directly 
from Arabic, but in the Subcontinent (and Iran?) the 
accepted Anglicization is "Ramazan". 


Outside the Arabic speaking world, those who know Arabic 
will sometimes try to differentiate between the different 
Arabic sounds (with varying degrees of success), but these 
letters are generally homophonous. 


In each of these sound groups, one letter is consistently 
used for native words, leaving the other "Arabic" consonants 
exclusively for Arabic loan words, which generally retain 
the original spelling. Old Persian, however, shared a /dh/ 
fricative with Arabic, and DHAL (ZAL) was used to indicate 
this sound in native words. This spelling continues today, 
though the phonetic distinction no longer exists in modern 
Persian. 


Most languages retain the original spelling of words 
borrowed from Arabic, occasionally replacing "Arabic" 
consonants with "native" ones. Sometimes there is a desire 
to exclude all "non-native" consonants, and this is a point 
of orthographic controversy in some languages. In languages 
retaining "Arabic" consonants, they are generally introduced 
in primers in later lessons. Since Arabic loanwords tend to 
be higher level vocabulary, finding appropriate words for 
picture A-B-C books is often problematic. 


/t/    TE ت      TOI ط
/s/    SIN س     SE ث     SWAD ص
/z/    ZE ز      ZAL ذ    ZWAD ض    ZOI ظ
/ɣ/    GHAIN غ   QAF ق    (Iranian Persian)
/k/    KAF ک     QAF ق    (Most others)


Homophonous letters 
in Persian and daughter alphabets 
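

The spelling burden created by homophonous letters can be 
estimated with a short Python sketch. It is illustrative 
only: the letter names TOI and ZOI are my reading of the 
chart, the choice of TE, SIN and ZE as the "native" letters 
is an assumption that suits Persian, and the function names 
are invented. 

```python
from math import prod

# sound -> letters that can write it. The first letter listed is taken
# to be the one used for native words; the rest occur in Arabic loans.
HOMOPHONES = {
    "t": ["TE", "TOI"],
    "s": ["SIN", "SE", "SWAD"],
    "z": ["ZE", "ZAL", "ZWAD", "ZOI"],
}


def possible_spellings(sounds):
    """Count the spellings a writer must choose between for one word."""
    return prod(len(HOMOPHONES.get(sound, [sound])) for sound in sounds)


def native_spelling(sounds):
    """Spell a word using only the 'native' letter of each sound group."""
    return [HOMOPHONES.get(sound, [sound])[0] for sound in sounds]


# Consonant skeleton of "Ramazan": only the /z/ is ambiguous.
print(possible_spellings(["r", "m", "z", "n"]))
# A word containing both /s/ and /t/ has 3 x 2 candidate spellings.
print(possible_spellings(["s", "t"]))
```

A reader meets no such difficulty, since every candidate 
letter has the same sound; the ambiguity falls entirely on 
the writer. 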


These "Arabic" letters are not generally reassigned to 
represent non-Arabic sounds; instead new letters are 
invented. This leads to large alphabets -- though Arabic 
has only 28, Malay has 31, Persian 32, Urdu 37 (or 47 
depending on how you count), Pashtu 42 and Sindhi 52! 


New consonants are easily created by adding dots or other 
marks to existing letters. The base form is usually 
selected because of a sound similar to the "new" one, and a 
strong tendency can be observed to preserve or even create 
sound-symbol relationships. 


Thus, Persian requires /p/ and /ch/ sounds which Arabic 
doesn't have, and it uses the base forms of BE and JIM to 
create new letters, replacing the single dot below with 
three dots. So, "three dots below", a combination not 
present in Arabic, seems to mean "devoicing". 


Likewise, a /zh/ sound is not present in Arabic, but is 
added in Persian by the analogy s:sh :: z:zh. Thus, since 
SHIN adds 3 dots on the SIN shape, ZHE is created by placing 
3 dots above the ZE shape (or more accurately, above the RE 
shape, which is the base form of ZE). Pashtu continues this 
analogy by representing the retroflexed fricatives with the 
same base shapes with two dots, one above and one below. 
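

These derivations can be written out as operations on a 
base shape plus a dot pattern. The Python sketch below is 
only a way of restating the paragraphs above; the Letter 
record, the with_dots helper and the names given to the two 
Pashtu retroflex letters are invented for the example. 

```python
from typing import NamedTuple, Optional


class Letter(NamedTuple):
    base: str             # name of the base shape
    dots: int             # number of dots
    place: Optional[str]  # "above", "below", "above+below" or None


ARABIC = {
    "BE": Letter("BE", 1, "below"),
    "JIM": Letter("JIM", 1, "below"),
    "RE": Letter("RE", 0, None),
    "ZE": Letter("RE", 1, "above"),
    "SIN": Letter("SIN", 0, None),
    "SHIN": Letter("SIN", 3, "above"),
}


def with_dots(source, dots, place):
    """Keep the source letter's base shape but give it a new dot pattern."""
    return Letter(source.base, dots, place)


PERSIAN = dict(ARABIC)
PERSIAN["PE"] = with_dots(ARABIC["BE"], 3, "below")    # b -> p
PERSIAN["CHE"] = with_dots(ARABIC["JIM"], 3, "below")  # j -> ch
PERSIAN["ZHE"] = with_dots(ARABIC["ZE"], 3, "above")   # s:sh :: z:zh

PASHTU = dict(PERSIAN)
PASHTU["RETROFLEX_ZHE"] = with_dots(PERSIAN["ZHE"], 2, "above+below")
PASHTU["RETROFLEX_SHIN"] = with_dots(ARABIC["SHIN"], 2, "above+below")

for name in ("PE", "CHE", "ZHE"):
    print(name, PERSIAN[name])
```

Every derived letter keeps a base shape that already exists 
in Arabic, so a new letter costs the reader only a new dot 
pattern. 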


Voiced Stops:    /dʒ/ JIM ج  (Ar,Pe,Pa)    /b/ BE ب    (Ar,Pe,Pa)
Unvoiced Stops:  /tʃ/ CHE چ  (Pe,Pa)       /p/ PE پ    (Pe,Pa)
Fricatives:      /z/ ZE ز    (Ar,Pe,Pa)    /s/ SIN س   (Ar,Pe,Pa)
Palatalized:     /ʒ/ ZHE ژ   (Pe,Pa)       /ʃ/ SHIN ش  (Ar,Pe,Pa)
Retroflexed:     /ʐ/ ZHE ږ   (Pa)          /ʂ/ SHIN ښ  (Pa)


Consonant Innovations from Arabic (Ar) 
in Persian (Pe) and Pashtu (Pa) 


Arabic has five dot combinations: one dot, above or below 
the base line character; two horizontal dots, above or 
below; and three dots above, arranged in an upward pointing 
triangle. Mentioned above are two additional combinations: 
three dots below, arranged in a downward pointing triangle, 
and two dots, one above and one below, used in Persian and 
Pashtu respectively. Pashtu also adds two vertical dots 
below, distinguished from the normal horizontal placement. 


[Figure: Dot combinations and other symbols used on 
consonants in various languages, with rows for the Arabic 
basis and for Persian, Pashtu, Urdu, Sindhi and other 
language additions] 


Sindhi distinguishes horizontal and vertical two dots both 
above and below the base character. Above the line, Sindhi 
distinguishes two kinds of three dot combinations, one 
upward pointing and the other downward pointing, though 
below the line it only uses three downward pointing dots. 
With the addition of four dots above and four dots below, 
Sindhi has the greatest consonantal variety of any of the 
Arabic based scripts I have seen. 
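

The dot inventories just listed can be tallied as sets. 
This Python sketch only restates the description above; it 
assumes that Pashtu keeps the Persian addition and that 
Sindhi keeps all five Arabic combinations, and the labels 
are invented for the example. 

```python
ARABIC = {
    "1 above",
    "1 below",
    "2 above horizontal",
    "2 below horizontal",
    "3 above upward",
}

PERSIAN = ARABIC | {"3 below downward"}

PASHTU = PERSIAN | {"1 above + 1 below", "2 below vertical"}

SINDHI = ARABIC | {
    "2 above vertical",
    "2 below vertical",
    "3 above downward",
    "3 below downward",
    "4 above",
    "4 below",
}


def additions(script, basis=ARABIC):
    """Dot combinations a script uses beyond the Arabic basis."""
    return sorted(script - basis)


for name, script in [("Persian", PERSIAN), ("Pashtu", PASHTU), ("Sindhi", SINDHI)]:
    print(name, len(script), additions(script))
```

The totals come only from the combinations named in this 
paper, not from an independent survey of each alphabet. 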


Generally, these dots are added to one of the base shapes 
which already had dots, but Sindhi adds dots to the GAF 
shape, a derivative of the KAF shape, which did not use dots 
in Arabic. 


The number 4 has been used in some minority languages to 
represent four dots. Calligraphically, two dots can be 
replaced by a single bar, three dots by a circle, and four 
dots by parallel lines. 


Other marks besides dots are used to distinguish letters. 
Again, the base shape is usually selected to be phonetically 
similar to the "new" letter. 


Urdu originally used 4 dots above to indicate retroflexed 
consonants, but these came to be replaced by a small raised 
TOI letter. Pashtu writes these same sounds by attaching an 
ear-shaped hoop called gharwandde to the base of the letter. 
Chitrali uses the TOI in conjunction with dots to indicate 
retroflexed fricatives and affricates. 


Vowel diacritics are sometimes incorporated with consonantal 
shapes to create new letters. The sukkun or no-vowel mark 
is sometimes written as a V or as an inverted V. Kurdish 
uses the former above a LAM for a dark /l/; Kashmiri uses 
the latter above NUN and RE and Parkari above DAL and RE. 
Shina uses the circular sukkun below an undotted JIM and 
above an undotted NUN, and two such marks below an undotted 
JIM and above a SIN. The Pashtu /dz/ affricate is written 
with an undotted JIM shape topped with a hamza, which isn't 
actually a vowel mark in Arabic, but is used elsewhere to 
indicate vowel glides or a glottal stop. 


Similarly, Persian writes /g/ as a double barred KAF 
shape, which looks quite like KAF with a fatha (zabar) above 
it. This requires a further change: writing the bar(s) on 
both letters word finally, a convention not generally 
followed in Arabic. GAF was, however, a relatively late 
innovation; late nineteenth and early twentieth century 
Persian books don't distinguish KAF and GAF. This apparently 
explains why Pashtu, which follows every other Persian 
innovation, indicates /g/ differently, as a KAF with a 
gharwandde (ear) hanging from the bar or base. 


The base form may be changed by a new script. The "changed" 
form may actually be an innovation, or it may be an adoption 
of one particular style over another. In Iran and South 
Asia, the isolate and final YE frequently has no dots, 
whereas in Arabic it usually does. Similarly, Sindhi uses 
the short tailed MIM exclusively, and chooses a particular 
form of the JIM shape group to the exclusion of others. 


Sometimes two alternate forms are both adopted into a 
"daughter" script, but with different meanings. The 
horizontal and vertical two dot combinations could be 
regarded in this way. Another example is Urdu, where the 
"hook" form of HE indicates an independent consonant, 
whereas the "butterfly" form indicates aspiration of the 
previous consonant. Similarly, Sindhi distinguishes between 
the upright and folded KAF shapes, indicating presence and 
absence of aspiration respectively. 


Dental Stops:   TE ت /t/    DAL د /d/    RE ر /r/    NUN ن /n/
(Ar,Ur,Pa)
Retroflexed:    TTE ٹ /ʈ/   DDAL ڈ /ɖ/   RRE ڑ /ɽ/
(Urdu)
Retroflexed:    TTE ټ /ʈ/   DDAL ډ /ɖ/   RRE ړ /ɽ/   NNUN ڼ /ɳ/
(Pashtu)


Retroflexed Consonants in Urdu and Pashtu 


Digraphs have also increased the repertoire of symbols in 
Arabic scripts. Urdu puts the "butterfly" form of the HE 
after eleven different stops to indicate aspiration of that 
stop. Pakistani Pashtu indicates retroflexed /n/ as a dental 
NUN plus retroflexed RRE, which in turn is a regular RE with 
the gharwandde (ear) on it. (Afghan Pashtu uses a separate 
grapheme, dental NUN with the gharwandde). Kurdish has a 
trilled /r/ word initially and a flap vs. trill contrast 
elsewhere; like Spanish it uses a single RE word initially 
and for the flap, and a double RE for non-initial trilled 
/r/. Single and double WAU provide contrast between two 
high back vowels. 
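

The Kurdish RE convention is simple enough to state as a 
rule. The Python sketch below follows only the description 
given here; the symbols "r" for the flap and "R" for the 
trill, and the function name, are invented for the example, 
and other sounds are passed through unchanged. 

```python
def spell_r_sounds(phonemes):
    """Spell one word's phonemes, writing flap 'r' and trill 'R' with RE.

    A trill is doubled only when it is not word initial, because word
    initially there is no flap for it to contrast with.
    """
    letters = []
    for index, phoneme in enumerate(phonemes):
        if phoneme == "R" and index > 0:
            letters += ["RE", "RE"]  # non-initial trill: double RE
        elif phoneme in ("r", "R"):
            letters.append("RE")     # flap, or word-initial trill
        else:
            letters.append(phoneme)  # everything else is left as given
    return letters


print(spell_r_sounds(["R", "a"]))
print(spell_r_sounds(["a", "r", "a"]))
print(spell_r_sounds(["a", "R", "a"]))
```

The digraph costs no new symbol, and the reader recovers 
the trill from position and doubling alone. 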


The Arabic vowel system is quite simple: three short vowels, 
indicated by diacritics above or below the preceding 
consonant, and three long vowels, indicated by full letters. 


Arabic has three short vowel diacritics: fatha, a "northeast 
to southwest" accent above the letter; kasra, similarly 
shaped but below the letter; and damma, a small raised comma 
or WAU letter shape above the letter. In Persian and South 
Asian nomenclature, these are zabar, zer and pesh 
respectively. They are placed on the consonant which they 
follow, and word initially on an ALEPH. A sukkun, shaped 
like a small circle, half circle, V or upside down V, 
indicates absence of vowel. 


Distinctive varieties of the short vowel marks are used to 
indicate certain Arabic grammatical endings, which are 
realized as vowel plus /n/ in connected speech. These 
symbols are formed by doubling the vowel diacritics. 


Long vowels are indicated by baseline characters, which 
technically should be considered prolongation marks rather 
than actual vowel letters. The long forms of the fatha, 
kasra and damma are indicated respectively by ALEPH and the 
semivowels YE and WAU. Word initially, they are preceded by 
ALEPH, or in the case of ALEPH itself, by an ALEPH with a 
tilde shaped madde above. 
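

On a present-day computer the short vowel marks and the 
sukkun are stored as separate characters following the 
consonant, so an unvowelled text is simply one with those 
characters left out. The Python sketch below assumes the 
Unicode encoding, which is not part of this paper's 
original setting; the table and function names are invented 
for the example. 

```python
import unicodedata

# Unicode code points for the marks described above.
SHORT_VOWEL_MARKS = {
    "\u064E": "fatha (zabar)",
    "\u0650": "kasra (zer)",
    "\u064F": "damma (pesh)",
    "\u0652": "sukkun",
    "\u064B": "doubled fatha",
    "\u064D": "doubled kasra",
    "\u064C": "doubled damma",
}


def strip_vowel_marks(text):
    """Return the text as normally written, without short vowel marks."""
    return "".join(ch for ch in text if ch not in SHORT_VOWEL_MARKS)


# KAF, TE, BE, each followed by a fatha: the fully vowelled "kataba".
vowelled = "\u0643\u064E\u062A\u064E\u0628\u064E"
plain = strip_vowel_marks(vowelled)
print(len(vowelled), len(plain))
for mark, label in SHORT_VOWEL_MARKS.items():
    print(label, unicodedata.name(mark))
```

The long vowel letters ALEPH, WAU and YE are ordinary 
letters and are untouched by this function, which matches 
the practice of writing long vowels but omitting short ones. 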




Modifications of the vowel system are numerous. Pashtu 
writes the schwa with a small raised bar or ALEPH and an 
inverted pesh (damma). Kashmiri uses the hamza marker above 
and below consonants. Philippine Arabic alphabets have 
dotted the three basic short vowel marks for a total of six. 
The dots were placed above the upper diacritics and below 
the lower ones, presumably to separate them from 
consonantal dots. 


Long vowel markings have also been modified. Kurdish and 
Kashmiri both use the V-shaped sukkun above the long vowel 
letters YE and WAU. 


Most languages follow the Arabic tendency to omit short 
vowel diacritics except when pronunciation is critical, as 
for proper names, dictionaries, learners' books or holy 
books. But how and whether to write vowels has been a great 
source of controversy in orthography design. 


Pashtu has overcome some of this difficulty by a trend to 
write all vowels as if long, since length is not as 
significant as vowel quality. Further, /i/ and /e/ are 
distinguished by whether the two dots below the YE are 
horizontal or vertical respectively, though some 
orthographies only make this distinction word finally. 


Kurdish writes its eight vowels with seven base line 
letters, leaving one unwritten, completely avoiding the 
issue of short vowel diacritics. 


Urdu has 8 vowels and 2 diphthongs (not counting phonemic 
nasalization) and can distinguish them all when vowel 
diacritics are used. But the diacritics are frequently 
missing, leaving only five distinctions word finally and 
four elsewhere. They are sometimes added in ambiguous 
situations. 


Extra graphemic discrimination word finally occurs in many 
Arabic scripts, frequently corresponding to grammatical 
distinctions. 


For example, the normally undotted Arabic consonant HE 
sometimes takes two dots word finally to represent a 
feminine ending. In isolation there is no phonetic 
distinction, but a /t/ sound is inserted in connected speech 
between the feminine ending and the following word. The 
special doubled short vowel marks mentioned above are 
another example in Arabic. 


Similarly, Urdu orthography does not generally distinguish 
between vowel + /n/ and nasalized vowel; both are indicated 
by vowel + NUN. However, word finally this distinction can 
be important grammatically (nasalized vowels indicate the 
plural morpheme) and the nasalized vowel there is marked by 
vowel plus undotted NUN. 




Again in Urdu, word medial /i/ and /e/ are distinguished 
only by vowel diacritics, which often aren't written. But 
word finally this is an important grammatical difference; 
the normal but undotted YE is used for /i/ but /e/ is 
indicated by a YE which curves back underneath the rest of 
the word. These are known as "little YE" and "big YE" 
respectively. 


Pashtu has a special grapheme for the word final /schwa + i/ 
diphthong, which marks some feminine nouns and the second 
person plural verbal suffix. Interestingly, in some 
orthographies the two affixes are spelt differently -- one 
by YE with a small downward tail at the end of the curve, 
the other by YE with a hamza above it. Other orthographies 
consistently use one or the other of these graphemes in both 
situations. 

Successful trans-dialect orthographies exist in Arabic 
scripts. In Arabic itself, the letter JIM is pronounced 
/dʒ/, /zh/ and /g/ in various areas. Several Pashtu 
consonants shift in pronunciation between dialects. Some 
dialects have an extra way of writing /x/, /sh/, /g/, /zh/, 
or (heaven help us) yet another /s/ and /z/, creating some 
spelling problems. But this consonantal 
over-differentiation combined with a certain amount of vowel 
under-differentiation helps cover the phonetic shifts 
between dialects. 


Morphophonemic spelling also is possible in Arabic scripts. 
Some of the extra word final differentiation discussed above 
could broadly be considered in this category. The Arabic 
definite article is always spelt morphophonemically ALEPH 
LAM. Preceding certain "lunar" consonants, it is in fact 
pronounced /al/; preceding other "solar" consonants, the LAM 
assimilates to the consonant itself. Another example of 
"morpho graphemics" occurs in the Jawi (Arabic) orthography 
for Malay. Word (and part word?) reduplication is written 
with the Arabic numeral two, though it is completely spelt 
out in Roman orthography. 
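

The behaviour of the definite article can be sketched in 
Python to show how one spelling serves two pronunciations. 
The ASCII transliteration (capital letters for the emphatic 
consonants) and the function names are invented for this 
example; the set of "solar" consonants is the standard one 
for Arabic. 

```python
# The 14 "solar" consonants, in an ad hoc ASCII transliteration.
SOLAR = {
    "t", "th", "d", "dh", "r", "z", "s", "sh",
    "S", "D", "T", "Z", "l", "n",
}


def spell_with_article(consonant, rest):
    """The spelling always keeps ALEPH LAM, whatever consonant follows."""
    return "al-" + consonant + rest


def pronounce_with_article(consonant, rest):
    """The LAM assimilates to a following solar consonant."""
    if consonant in SOLAR:
        return "a" + consonant + "-" + consonant + rest
    return "al-" + consonant + rest


for consonant, rest in [("sh", "ams"), ("q", "amar")]:
    print(spell_with_article(consonant, rest),
          pronounce_with_article(consonant, rest))
```

The two sample words, "sun" and "moon", are the ones that 
give the solar and lunar consonant classes their names. 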


PROBLEMS AND PRINCIPLES IN ADAPTING ARABIC SCRIPT 


Potential orthographic controversies abound in developing 
Arabic based scripts, though probably no more than in other 
traditions. 


In some parts of the world, the choice between an Arabic 
based script and other potential scripts is itself 
controversial. Arabic script may be the only practical 
alternative in some areas, but where alternatives exist, 
many political and religious factors will enter into the 
decision. In such cases, the use of di-script and bridge 
materials may prove to be the most practical, and could in 
fact deflect much of the controversy. 


Even within the Arabic script family, more than one 
potential model alphabet and script style may exist. 
Language attitudes will cause some people to want their 
language to be identifiably distinct; others may prefer to 
minimize such distinctions. Choosing and promoting a 
standard calligraphic style is another potential source of 
controversy. 


When a language requires additional graphemes, there may be 
alternate choices. Several possibilities may exist for new 
consonants, especially when multiple orthographies exist for 
the same or nearby languages. Consonants representing 
non-native sounds may be retained for loan words, eliminated 
altogether, or even reassigned to new sounds. New vowel 
letters or diacritics may need to be created, and a decision 
made on whether and when to write the diacritics. 


Word division must also be decided. Prepositions and other 
very short words are sometimes connected to an adjacent 
word. Long words with many affixes, on the other hand, may 
be better divided into several ligatures. 


Several principles may help reach acceptable orthography 
decisions. 


First, learn all you can about the orthographic environment. 
Have there been any previous orthographies for the language 
in question? What are the traditions of script modification 
for other local and regional languages? Are there several 
orthographic traditions? How can your script modifications 
fit in with these traditions? 


Second, learn all you can about the sociolinguistic 
environment. What roles do different scripts (e.g., Arabic 
and Roman or Urdu and Sindhi) play in the area? What are 
the attitudes towards these different script families? How 
can you take these attitudes into consideration in 
developing script(s) in your situation? 


Third, learn all you can about the linguistic environment. 
What dialects exist? What is the phonetic inventory? How 
much semantic load do different phonemes carry? What 
special features like morphophonemics and affixation need 
to be considered in developing the script? How can the 
orthography best represent these features within the 
orthographic traditions? 


Most of these principles apply to any kind of script. But 
with ingenuity in applying them to Arabic based scripts, 
good orthographies can and should be developed which fit 
into the great linguistic and calligraphic traditions of the 
Arabic script world. 




Appendix: Arabic Script Questionnaire 


I am very interested in finding out information about other 
Arabic based scripts: primarily the major trade languages, 
and secondarily adaptations made for smaller languages. 
This questionnaire may help you focus your thoughts. Even 
photocopied alphabet charts would help greatly! 

Chas. Meeker %Don Gregson 
Horsleys Green, High Wycombe 
Bucks HP14 3XL ENGLAND 


A. General Information 

1. Your Name, address and date. 
2. Language(s) 

3. Country(ies) of use 

4. Any useful references 


B. Orthographic Factors 
1. Alphabet. List the letters in alphabetical order (is it 
standard?), along with their names, phonetic value(s), 
and the normal transliteration. Show the initial, 
medial and final forms as needed. 

2. Consonants. List all consonant phonemes on a phone 
chart. 

3. Vowels. List all vowels and diphthongs with their 
phonetic values, transliteration and how written. 

4. Vowel chart. List all vowel phonemes on a phone chart. 

5. Suprasegmentals. List any suprasegmentals (stress, tone, 
length, etc). How significant are they in the 
language? Are these indicated in the script? 
Describe. 

6. List any other marks used in the writing system and 
explain their use. 

7. What calligraphic style is usually used in printing? in 
handwriting? typewriting? computer output? Please 
include samples from newspapers, etc. 


C. Linguistic Factors 

1. Are any sounds spelled more than one way? (i.e., causing 
"spelling" problems, like Spanish b/v or English 
ch/tch). Please list, and tell how the different 
spellings are conditioned: environmentally, 
derivationally, historically, lexically or other. 

2. Are there any regional or other variations in the way the 
script is written? Describe. 

3. Are "non-native" letters retained in spelling borrowed 
words? Describe. 

4. Are there any "foreign" phonemes used only in borrowed 
words? Describe. 

5. Are any letters pronounced in more than one way? (i.e., 
causing "reading" problems, like English 'c' = /k/ or 
/s/). Please list, and tell how the different 
pronunciations are conditioned: environmentally, 
derivationally, historically, dialectally, lexically 
or other. 

6. Are any dialect differences successfully covered by the 
script? (e.g., phonetic shifts) Describe. 

7. Are there any difficulties in the script caused by 
regional or other dialects? 

8. Are there any digraphs -- e.g., two graphemes to 
represent one phoneme, like English 'sh' or 'th'? 
Describe. 

9. Are there any "portmanteau" graphemes which represent 
two phonemes, like English 'x' = /ks/? Describe. 


10. Are there any morpho phonemic spellings? Describe and 
give examples. 

11. Are vowels written? Under what circumstances? 

12. Are there differences between oral pronounciation and 
spelling pronounciation? (e.g., English "wanna" vs. 
"want to") Describe and give examples. 

13. Are there differences between spelling pronounciation 
and writing? (e.g., English /kom/ vs. 'comb'). 
Describe and give examples 

14. What rule(s) are used for word division? 

15. Are there any prefixes or suffixes which are part of the 
grammatical or phonetic word but which are separated, 
as a separate orthographic word? Describe.

16. Are there any particles or clitics which are separate
phonetic or grammatical words, but which are written
connected, as part of the orthographic word? Describe. 

17. How good is the linguistic "match" between the script 
and the language? 

18. Are there other "mismatches" besides those listed in
the above questions? Describe. 

19. Do the "mismatches" hinder (or help) the script's 
usefulness? 

20. Are there any other interesting or unusual things you 
can mention about the script in this language? 


D. Sociolinguistic factors 

1. How strong is the use of script in this language? 

2. Is script used in any mass media? 

3. What is the competition to script use? (illiteracy, 
literacy in another script, literacy in another 
language (same or different script)) 

4. Is another script used for the same or similar language? 
Where and by whom? 

5. What factors influence script choice (religion, country, 
occupation, genre, etc.)? Describe.

6. Is script use stable or changing? Describe. 

7. When was this script developed for this language? By 
whom? 

8. What language (script) is it based on? 

9. Does it seem an overt effort was made to make the script 
different from any particular language? Describe. 

How does this relate to language attitudes? 

10. Are there any government or private language 
authorities, academies or orthography boards? 
Describe. 

11. Have there been any major decisions made as to script 
adoption or rejection? When, by whom and for what 
purposes? Describe. 

12. What are the major points of controversy in the 
orthography? Describe here or summarize from the above 
answers. 

13. Is orthography stable or changing? Describe. 




ADAPTATION OF ARABIC SCRIPT 


The major scripts used for national languages can be divided
into seven major script families, and each of these can be
traced back to a particular language. Each script can be
associated with a particular great cultural tradition, which,
as it spread, spread not only its script but also other
cultural trappings, notably religion. The spread of Indian
culture and Buddhist religion and of Arabic culture and
Islamic religion was quite naturally associated with a
spread of Devanagari and Arabic scripts respectively, just as
today the ascendancy of certain cultures and ideas has gone
hand in hand with the spread of Roman and Cyrillic-based
scripts.


Population            Language
(millions)            Basis      Religious Background

   1893      39%      Latin      Western Christianity
   1212      25%      Chinese    Confucian, etc.
    962      20%      Sanskrit   Buddhist
    432       9%      Arabic     Muslim
    307       6%      Greek      Orthodox, Scientific Atheism
     44       0.9%    Amharic    Ethiopian Orthodox
      4       0.1%    Hebrew     Judaism


World Population by Script
showing language basis of script and
religion associated with its cultural tradition


Because of its close relationship with cultural tradition,
choice of orthography and script is a very emotional issue.
This is particularly noticeable today in the association of
Islam and Arabic-based scripts. With a long history of use
and the high value placed in many of these cultures on the
art of calligraphy, it is no wonder that the trend in this
century toward Roman and Cyrillic orthographies is viewed as
a cultural affront.


We will discuss some of the special characteristics of the
Arabic script itself, then look at some of the ways in which
it has been modified for other languages, and finally
discuss some principles helpful to those who may wish to
further adapt Arabic script. I should say that my own
experience is weak as regards Arabic itself and the
modifications of the script in Africa and Southeast Asia.
If you have more experience than I, I would appreciate your
input either by writing comments or by completing the
questionnaire included in the appendix.


THE WRITTEN ARABIC LANGUAGE 


The Arabic alphabet has 28 consonants written from right to
left and generally connected in a cursive fashion. Half of
these have dots placed above or below them which distinguish
them from other consonants with different combinations of
dots or with no dots at all. Vowels are indicated by
diacritics and by certain ambiguous consonant symbols.
Several additional marks are used to indicate absence of a
vowel, geminate consonants or other less frequent phenomena.


[Figure: The Arabic alphabet]


Cursive. The letters of a word are generally connected
together, like English cursive handwriting. Many of the
letters have a final flourish which is deleted to simplify
joining to the following letter. Joining to the preceding
letter requires only small changes, usually just the
addition of a small connecting line. Thus, letters
generally have four forms.


Some letters, called "non-joiners", break the cursive
pattern; they can be connected only on the right, to the
preceding letter. Thus, a non-joiner has only two forms:
final and isolate. Non-joiners will break a word into two
or more connected parts called "ligatures", with the
non-joiner itself at the end (left) of the ligature. The
letter following a non-joiner is in initial (or isolate)
form, because although it is not word initial, it is
ligature initial.
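

The joining rules just described can be sketched in a few lines of
Python. This is only an illustration, not a description of any real
typesetting program: the function names are invented here, the
non-joiner set is assumed to be the six basic Arabic letters ALEPH,
DAL, ZAL, RE, ZE and WAU, and the input is assumed to be a single
word written without diacritics.

```python
# Assumed set of "non-joiners": ALEPH, DAL, ZAL, RE, ZE, WAU.
# Adapted alphabets add more letters of this kind (e.g. Urdu RRE).
NON_JOINERS = {"\u0627", "\u062f", "\u0630", "\u0631", "\u0632", "\u0648"}


def positional_forms(word):
    """Return (letter, form) pairs for a word written without diacritics."""
    forms = []
    for i, letter in enumerate(word):
        # Joined on the right if there is a preceding letter and
        # that letter is able to connect forward.
        joined_to_previous = i > 0 and word[i - 1] not in NON_JOINERS
        # Joined on the left if a letter follows and this letter
        # is not itself a non-joiner.
        joined_to_next = i < len(word) - 1 and letter not in NON_JOINERS
        if joined_to_previous and joined_to_next:
            form = "medial"
        elif joined_to_previous:
            form = "final"
        elif joined_to_next:
            form = "initial"
        else:
            form = "isolate"
        forms.append((letter, form))
    return forms


def connected_parts(word):
    """Split a word into its connected parts (the "ligatures" of the text)."""
    parts, current = [], ""
    for letter, form in positional_forms(word):
        current += letter
        if form in ("final", "isolate"):  # nothing connects after this letter
            parts.append(current)
            current = ""
    return parts


kitab = "\u0643\u062a\u0627\u0628"  # KAF, TE, ALEPH, BE
print([form for _, form in positional_forms(kitab)])
print(len(connected_parts(kitab)))
```

In the sample word the ALEPH is a non-joiner, so it takes the final
form and the BE after it stands in isolate form, giving two
connected parts.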


Adaptation of Arabic Script Oct 87 2


[Figure: Examples of Arabic cursive shapes, with columns for the
isolate, final, medial and initial forms and for the final, medial
and initial forms joined together]


Dots. Many letters share the same basic shape but differ
only by combinations of dots placed above or below the
letter. People generally write the base forms of all the
letters of a word, and then return to place the appropriate
number of dots above and below. But these dots are not
diacritics; they form an integral part of the letter, much
like the dot on the English i, which, in certain cursive
styles, is the only thing to distinguish it from an e.




This "dotting" feature has given Arabic script a great deal 
of flexibility which other languages have used to great 
advantage. 


The letters in a "shape group", that is, with the same basic 
shape, have the same four calligraphic forms (initial,
medial, final and isolate) except for the number and 
placement of dots. 


There are a few oddities, however. FE and QAF have distinct
isolate and final shapes, having a shallow and deep curve
respectively. (In North Africa, these letters are written
differently.) But the initial and medial forms are
distinguished from each other only by dots. Similarly, NUN
and YE each have their own isolate and final shapes, but
the initial and medial forms are distinguished from the BE
shape group only by dots.


I have not seen any modifications of the script which add 
any more such oddities to the script. 


Vowels. Three characters are used to indicate the long 
vowels. Two of these (WAU and YE) can ambiguously represent 
the semivowels; the third (ALEPH) is associated with the
glottal stop consonant. 


[Figure: Letters which break the shape group patterns, shown in
isolate, final, medial and initial forms]


Diacritics above or below a consonant indicate when it is
followed by a short vowel or by no vowel. Although the long
vowel letters are almost always written, these diacritics
are used mainly in the Koran, poetry or beginners' texts, to
ensure proper pronunciation.


This vowel system has proved a challenge for those adapting 
the Arabic script to other languages. 


HISTORY OF ARABIC SCRIPTS AND CALLIGRAPHY 


Of the script families in use today, five of them --
Amharic, Arabic, Greek/Cyrillic, Hebrew and Latin -- are
distant cousins, tracing their ancestry back to a script in
use about 1000 BC in Phoenicia. This non-cursive script had
22 consonants and no way of writing vowels.


By the 8th century BC, certain letters had become ambiguous
"sometimes vowels", like our English w and y. A century
later the script was borrowed by the ancestor of Amharic, 
and also by the ancestor of the Greek alphabet. The Greeks 
dropped the consonantal value of the ambiguous letters, but 
the ambiguity was reintroduced in Latin and continued in 
English (u/v and i/j) until recent centuries. 


The ancestor script of modern Hebrew broke off some time
after Greek and Amharic; this may explain why Hebrew and
Arabic are both written from right to left.


The "Arabic" branch of the family did not make the Greek
vowel innovation, but retained the original ambiguous
notation. By about the third century AD, varieties of this
alphabet were used for Syriac, Palmyrene and Nabatean. All
three developed a system of cursive writing whereby many of
the letters within a word were connected together.


With the cursive system came the differentiation of the 
forms depending on position in the word. Further, certain 
letters lost their distinctive shape. Syriac early 
developed the technique of disambiguating such similar 
letters by placing dots over or under them, but Nabatean, 
the ancestor of Arabic, did not attempt to use this device. 


Jazm, the first script of the Quran, had developed from
Nabatean by the 7th century. This jerky, angular script
still did not consistently use the dots necessary to
differentiate all of the consonants. In fact, this
convention remained somewhat sporadic even up to medieval
times! Long vowels were indicated, after the Nabatean
system, but not consistently. The word Allah was pronounced
with a long /a/ in the second syllable, but spelt without an
ALEPH. Modern spelling also omits the ALEPH, placing
instead a small raised ALEPH above the LAM.


The early Jazm script was soon superseded for Quranic
copying by Kufic, another angular script. Varieties of this
script are still widely used in title pages, carvings and
other decorative work, and in the Maghribi (Western) script
of Morocco and Algeria.


The need to correctly read the Quran encouraged further
developments. Not only were consonantal dots and long
vowels important, but by the 8th century a standard system
of marking short vowels was adopted, at least for the Quran.
But the Eastern pronunciation was imposed on the received
Meccan consonantal orthography, creating anomalies still
visible today, such as the spelling of 'Isa with a final YE
plus small raised ALEPH.


Naskh calligraphy, with its smooth curves, evolved in the
ninth century, and developed into the "flat" styles popular
in the subcontinent and other eastern parts of the Islamic
world. This and other styles of calligraphy were
systematized in the tenth century by ibn Muqlah based on
three units: the dot; the ALEPH, whose height in dots
characterized each style; and the circle, whose diameter was
equal to an ALEPH.


Other calligraphic styles have developed over the centuries. 
Taliq, a "hanging" or "sloping" style, dates from the 
fifteenth century in Persia. Persian calligraphy has 
generally followed this pattern, but the introduction of 
moveable type brought the acceptance of a more flat style. 
Nastaliq (Naskh-Taliq) style is a combination of Naskh and 
Taliq, and is the only style fully acceptable for Urdu. 


Mechanizing Nastaliq is much more difficult than mechanizing
the flat styles, so almost all Urdu printing is
photo-reproduced from the work of a calligrapher.
Typewriters and most computers use a flat style, though one
newspaper has an acceptable computerized Nastaliq font and a
hybrid style has been developed using the ED and MS programs
developed by JAARS, Inc.


Ruqa (or Riqa?) style dates to the same era as Taliq, and
became widely accepted during the nineteenth century,
developing into the handwriting style used for Arabic
today.


Today one can see a great variety in the use of different
script styles in the Arabic script world. Distinctive
styles are used by different languages and in different
regions, even in the same text for emphasis as we might use
italic and roman together. But much more has been done to
adapt the script to languages which are sometimes quite
different from the original Arabic.


ADAPTING ARABIC SCRIPT TO OTHER LANGUAGES 


Arabic script has been adapted to many kinds of languages:
Semitic, Indo-European, Turkic, Slavic, Malayo-Polynesian
and African, including Persian, Urdu, Pashtu, Sindhi, Malay,
Turkish, Swahili, Spanish, Hebrew, Berber, Sudanese. This
has required many kinds of adaptations.


[Figure: Derivation of Arabic Based Scripts, a tree descending
from Arabic; the legible labels include Swahili, Persian, Malay,
Philippine languages, Kurdish, Pashtu and Sindhi]


Some Arabic phonemes are not found in other languages. A
few languages delete the corresponding "Arabic" letters from
their alphabet, but most retain them, keeping the alphabet
of the Quran a subset of each Arabic-based alphabet. The
"non-native" letters are assigned a phonetically close
"native" sound, resulting in overdifferentiation and
spelling problems.


Notice that this leads to multiple pronunciations of some
Arabic words in English. The month of fasting is generally
called "Ramadan" when it is borrowed into English directly
from Arabic, but in the Subcontinent (and Iran?) the
accepted Anglicization is "Ramazan".


Outside the Arabic speaking world, those who know Arabic
will sometimes try to differentiate between the different
Arabic sounds (with varying degrees of success), but these
letters are generally homophonous.


In each of these sound groups, one letter is consistently
used for native words, leaving the other "Arabic" consonants
exclusively for Arabic loan words, which generally retain
the original spelling. Old Persian, however, shared a /dh/
fricative with Arabic, and DHAL (ZAL) was used to indicate
this sound in native words. This spelling continues today,
though the phonetic distinction no longer exists in modern
Persian.


Most languages retain the original spelling of words
borrowed from Arabic, occasionally replacing "Arabic"
consonants with "native" ones. Sometimes there is a desire
to exclude all "non-native" consonants, and this is a point
of orthographic controversy in some languages. In languages
retaining "Arabic" consonants, they are generally introduced
in primers in later lessons. Since Arabic loanwords tend to
be higher level vocabulary, finding appropriate words for
picture A-B-C books is often problematic.


[Figure: Homophonous letters in Persian and daughter alphabets.
Legible entries: a /t/ group including TE; /s/ written with SIN,
SE and SWAD; /z/ written with ZE, ZAL and ZWAD; QAF grouped with
one letter in Iranian Persian and with KAF in most others]


These "Arabic" letters are not generally reassigned to
represent non-Arabic sounds; instead, new letters are
invented. This leads to large alphabets -- though Arabic
has only 28, Malay has 31, Persian 32, Urdu 37 (or 47
depending on how you count), Pashtu 42 and Sindhi 52!


New consonants are easily created by adding dots or other
marks to existing letters. The base form is usually
selected because of a sound similar to the "new" one, and a
strong tendency can be observed to preserve or even create
sound-symbol relationships.


Thus, Persian requires /p/ and /ch/ sounds which Arabic
doesn't have, and it uses the base forms of BE and JIM to
create new letters, replacing the single dot below with
three dots. So, "three dots below", a combination not
present in Arabic, seems to mean "devoicing".


Likewise, a /zh/ sound is not present in Arabic, but is 
added in Persian by the analogy s:sh :: z:zh. Thus, since 
SHIN adds 3 dots on the SIN shape, ZHE is created by placing 
3 dots above the ZE shape (or more accurately, above the RE 
shape, which is the base form of ZE). Pashtu continues this 
analogy by representing the retroflexed fricatives with the 
same base shapes with two dots, one above and one below. 
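

The Persian innovations described above can be pictured as a lookup
from a base shape and a dot pattern to a letter. The table and the
function name below are invented for illustration; the shape names
follow this paper, and the code points are the standard Unicode ones
for these letters. Only the handful of letters discussed here are
listed.

```python
import unicodedata

# (base shape, number of dots, position) -> letter.
# An illustrative subset only; not a complete alphabet.
LETTERS = {
    ("BE", 1, "below"): "\u0628",   # BE   (Arabic)
    ("BE", 3, "below"): "\u067e",   # PE   (Persian addition)
    ("JIM", 1, "below"): "\u062c",  # JIM  (Arabic)
    ("JIM", 3, "below"): "\u0686",  # CHE  (Persian addition)
    ("SIN", 0, None): "\u0633",     # SIN  (Arabic)
    ("SIN", 3, "above"): "\u0634",  # SHIN (Arabic)
    ("RE", 0, None): "\u0631",      # RE   (Arabic)
    ("RE", 1, "above"): "\u0632",   # ZE   (Arabic)
    ("RE", 3, "above"): "\u0698",   # ZHE  (Persian addition)
}


def derive(shape, dots, position):
    """Return the letter for a base shape plus dot pattern, with its Unicode name."""
    letter = LETTERS[(shape, dots, position)]
    return letter, unicodedata.name(letter)


# "Three dots below" turns the voiced stops into their unvoiced partners.
print(derive("BE", 3, "below")[1])
print(derive("JIM", 3, "below")[1])
# The analogy s:sh :: z:zh, three dots above on the RE shape.
print(derive("RE", 3, "above")[1])
```

Note that the Unicode character names (PEH, TCHEH, JEH) differ from
the letter names used in this paper (PE, CHE, ZHE).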


Voiced Stops:     /dʒ/ JIM  (Ar,Pe,Pa)    /b/ BE    (Ar,Pe,Pa)
Unvoiced Stops:   /tʃ/ CHE  (Pe,Pa)       /p/ PE    (Pe,Pa)
Fricatives:       /z/  ZE   (Ar,Pe,Pa)    /s/ SIN   (Ar,Pe,Pa)
Palatalized:      /ʒ/  ZHE  (Pe,Pa)       /ʃ/ SHIN  (Ar,Pe,Pa)
Retroflexed:      /ʐ/  ZHE  (Pa)          /ʂ/ SHIN  (Pa)

Consonant Innovations from Arabic (Ar)
in Persian (Pe) and Pashtu (Pa)


Arabic has five dot combinations: one dot, above or below
the base line character; two horizontal dots, above or
below; and three dots above, arranged in an upward pointing
triangle. Mentioned above are two additional combinations:
three dots below, arranged in a downward pointing triangle,
and two dots, one above and one below, used in Persian and
Pashtu respectively. Pashtu also adds two vertical dots
below, distinguished from the normal horizontal placement.


[Figure: Dot combinations and other symbols used on consonants
in various languages, with rows for the Arabic basis and for
Persian, Pashtu, Urdu, Sindhi and other-language additions]


Sindhi distinguishes horizontal and vertical two dots both
above and below the base character. Above the line, Sindhi
distinguishes two kinds of three dot combinations, one
upward pointing and the other downward pointing, though
below the line it only uses three downward pointing dots.
With the addition of four dots above and four dots below,
Sindhi has the greatest consonantal variety of any of the
Arabic-based scripts I have seen.


Generally, these dots are added to one of the base shapes
which already had dots, but Sindhi adds dots to the GAF
shape, a derivative of the KAF shape, which did not use dots
in Arabic.


The number 4 has been used in some minority languages to
represent four dots. Calligraphically, two dots can be
replaced by a single bar, three dots by a circle, and four
dots by parallel lines.


Other marks besides dots are used to distinguish letters. 
Again, the base shape is usually selected to be phonetically 
similar to the "new" letter. 


Urdu originally used 4 dots above to indicate retroflexed
consonants, but these came to be replaced by a small raised
TOI letter. Pashtu writes these same sounds attaching an
ear-shaped hoop called gharwandde to the base of the letter.
Chitrali uses the TOI in conjunction with dots to indicate
retroflexed fricatives and affricates.


Vowel diacritics are sometimes incorporated with consonantal 
shapes to create new letters. The sukkun or no-vowel mark 
is sometimes written as a V or as an inverted V. Kurdish 
uses the former above a LAM for a dark /l/; Kashmiri uses 
the latter above NUN and RE and Parkari above DAL and RE. 
Shina uses the circular sukkun below an undotted JIM and 
above an undotted NUN, and two such marks below an undotted 
JIM and above a SIN. The Pashtu /dz/ affricate is written 
with an undotted JIM shape topped with a hamza, which isn't 
actually a vowel mark in Arabic, but is used elsewhere to 
indicate vowel glides or a glottal stop. 


Similarly, Persian writes /g/ as a double barred KAF
shape, which looks quite like KAF with a fatha (zabar) above
it. This requires a further change: writing the bar(s) on
both letters word finally, a convention not generally
followed in Arabic. GAF was, however, a relatively late
innovation; late nineteenth and early twentieth century
Persian books don't distinguish KAF and GAF. This
apparently explains why Pashtu, which follows every other
Persian innovation, indicates /g/ differently, as a KAF with
a gharwandde (ear) hanging from the bar or base.


The base form may be changed by a new script. The "changed"
form may actually be an innovation, or it may be an adoption
of one particular style over another. In Iran and South
Asia, the isolate and final YE frequently has no dots,
whereas in Arabic it usually does. Similarly, Sindhi uses
the short tailed MIM exclusively, and chooses one particular
form of the JIM shape group to the exclusion of others.


Sometimes two alternate forms are both adopted into a
"daughter" script, but with different meanings. The
horizontal and vertical two dot combinations could be
regarded in this way. Another example is Urdu, where the
"hook" form of HE indicates an independent consonant,
whereas the "butterfly" form indicates aspiration of the
previous consonant. Similarly, Sindhi distinguishes between
the upright and folded KAF shapes, indicating
presence and absence of aspiration respectively.


Dental Stops:   TE /t/    DAL /d/    RE /r/    NUN /n/
(Ar,Ur,Pa)

Retroflexed:    TTE       DDAL       RRE
(Urdu)

Retroflexed:    TTE       DDAL       RRE       NNUN
(Pashtu)

Retroflexed Consonants in Urdu and Pashtu


Digraphs have also increased the repertoire of symbols in
Arabic-based scripts. Urdu puts the "butterfly" form of the
HE after eleven different stops to indicate aspiration of
that stop. Pakistani Pashtu indicates retroflexed /n/ as a
dental NUN plus retroflexed RRE, which in turn is a regular
RE with the gharwandde (ear) on it. (Afghan Pashtu uses a
separate grapheme, dental NUN with the gharwandde.) Kurdish
has a trilled /r/ word initially and a flap vs. trill
contrast elsewhere; like Spanish it uses a single RE word
initially and for the flap, and a double RE for non-initial
trilled /r/. Single and double WAU provide contrast between
two high back vowels.
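

The Urdu aspiration digraph can be illustrated with a short Python
sketch that groups a word into graphemes, treating a consonant plus
the "butterfly" HE as one unit. The function name is invented for
this example, and it assumes the butterfly HE is encoded as the
Unicode character U+06BE, which is how it is usually encoded today;
diacritics and other digraphs are ignored.

```python
# The "butterfly" form of HE, assumed here to be Unicode U+06BE.
BUTTERFLY_HE = "\u06be"


def graphemes(word):
    """Group a word into graphemes, attaching the butterfly HE to the stop before it."""
    units = []
    for ch in word:
        if ch == BUTTERFLY_HE and units:
            units[-1] += ch  # aspirate: one phoneme written with two letters
        else:
            units.append(ch)
    return units


# BE + butterfly HE, ALEPH, RE, TE: five letters but four graphemes.
bharat = "\u0628\u06be\u0627\u0631\u062a"
print(len(bharat), len(graphemes(bharat)))
```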


The Arabic vowel system is quite simple: three short vowels,
indicated by diacritics above or below the preceding 
consonant, and three long vowels, indicated by full letters. 


Arabic has three short vowel diacritics: fatha, a "northeast
to southwest" accent above the letter; kasra, similarly
shaped but below the letter; and damma, a small raised comma
or WAU letter shape above the letter. In Persian and South
Asian nomenclature, these are zabar, zer and pesh
respectively. They are placed on the consonant which they
follow, and word initially on an ALEPH. A sukkun, shaped
like a small circle, half circle, V or upside down V,
indicates absence of vowel.


Distinctive varieties of the short vowel marks are used to 
indicate certain Arabic grammatical endings, which are 
realized as vowel plus /n/ in connected speech. These 
symbols are formed by doubling the vowel diacritics. 


Long vowels are indicated by baseline characters, which
technically should be considered prolongation marks rather
than actual vowel letters. The long forms of the fatha,
kasra and damma are indicated respectively by ALEPH and the
semivowels YE and WAU. Word initially, they are preceded by
ALEPH, or in the case of ALEPH itself, by an ALEPH with a
tilde shaped madde above.


Modifications of the vowel system are numerous. Pashtu
writes the schwa with a small raised bar or ALEPH and an
inverted pesh (damma). Kashmiri uses the hamza marker above
and below consonants. Philippine Arabic alphabets have
dotted the three basic short vowel marks for a total of six.
The dots were placed above the upper diacritics and below
the lower ones, presumably to separate them from
consonantal dots.


Long vowel markings have also been modified. Kurdish and 
Kashmiri both use the V-shaped sukkun above the long vowel 
letters YE and WAU. 


Most languages follow the Arabic tendency to omit short
vowel diacritics except when pronunciation is critical, as
for proper names, dictionaries, learners' books or holy
books. But how and whether to write vowels has been a great
source of controversy in orthography design.
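

The effect of omitting the short vowel diacritics can be shown with a
small Python sketch that removes them from a fully marked word. The
function name is invented here. The sketch assumes the marks are
encoded as the Unicode characters U+064B to U+0652, a range which
also contains the doubled vowel marks and the gemination mark; the
marks added by other languages are not covered.

```python
# U+064B..U+0652: doubled fatha, damma and kasra, then fatha, damma,
# kasra, the gemination mark (shadda) and the no-vowel mark (sukkun).
SHORT_VOWEL_MARKS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_vowel_marks(text):
    """Return the text as it is normally written, without diacritics."""
    return "".join(ch for ch in text if ch not in SHORT_VOWEL_MARKS)


# KAF, TE and BE, each followed by a fatha.
marked = "\u0643\u064e\u062a\u064e\u0628\u064e"
plain = strip_vowel_marks(marked)
print(len(marked), len(plain))
```

Only the three consonants survive, which is why a reader of unmarked
text must supply the short vowels from knowledge of the language.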


Pashtu has overcome some of this difficulty by a trend to
write all vowels as if long, since length is not as
significant as vowel quality. Further, /i/ and /e/ are
distinguished by whether the two dots below the YE are
horizontal or vertical respectively, though some
orthographies only make this distinction word finally.


Kurdish writes its eight vowels with seven base line 
letters, leaving one unwritten, completely avoiding the 
issue of short vowel diacritics. 


Urdu has 8 vowels and 2 diphthongs (not counting phonemic
nasalization) and can distinguish them all when vowel
diacritics are used. But the diacritics are frequently
missing, leaving only five distinctions word finally and
four elsewhere. They are sometimes added in ambiguous
situations.


Extra graphemic discrimination word finally occurs in many 
Arabic scripts, frequently corresponding to grammatical 
distinctions. 


For example, the normally undotted Arabic consonant HE 
sometimes takes two dots word finally to represent a 
feminine ending. In isolation there is no phonetic 
distinction, but a /t/ sound is inserted in connected speech 
between the feminine ending and the following word. The 
special doubled short vowel marks mentioned above are 
another example in Arabic. 


Similarly, Urdu orthography does not generally distinguish 
between vowel + /n/ and nasalized vowel; both are indicated 
by vowel + NUN. However, word finally this distinction can 
be important grammatically (nasalized vowels indicate the 
plural morpheme) and the nasalized vowel there is marked by 
vowel plus undotted NUN. 


Again in Urdu, word medial /i/ and /e/ are distinguished
only by vowel diacritics, which often aren't written. But
word finally this is an important grammatical difference;
the normal but undotted YE is used for /i/ but /e/ is
indicated by a YE which curves back underneath the rest of
the word. These are known as "little YE" and "big YE"
respectively.


Pashtu has a special grapheme for the word final /schwa+i/
diphthong, which marks some feminine nouns and the second
person plural verbal suffix. Interestingly, in some
orthographies the two affixes are spelt differently -- one
by YE with a small downward tail at the end of the curve,
the other by YE with a hamza above it. Other orthographies
consistently use one or the other of these graphemes in both
situations.


Successful trans-dialect orthographies exist in Arabic
scripts. In Arabic itself, the letter JIM is pronounced
/j/, /zh/ and /g/ in various areas. Several Pashtu
consonants shift in pronunciation between dialects. Some
dialects have an extra way of writing /x/, /sh/, /g/, /zh/,
or (heaven help us) yet another /s/ and /z/, creating some
spelling problems. But this consonantal
overdifferentiation combined with a certain amount of vowel
underdifferentiation helps cover the phonetic shifts
between dialects.


Morphophonemic spelling also is possible in Arabic scripts.
Some of the extra word final differentiation discussed above
could broadly be considered in this category. The Arabic
definite article is always spelt morphophonemically ALEPH
LAM. Preceding certain "lunar" consonants, it is in fact
pronounced /al/; preceding other "solar" consonants, the LAM
assimilates to the consonant itself. Another example of
"morphographemics" occurs in the Jawi (Arabic) orthography
for Malay. Word (and part word?) reduplication is written
with the Arabic numeral two, though it is completely spelt
out in Roman orthography.
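

The behaviour of the Arabic definite article can be sketched in
Python as a contrast between a fixed spelling and a variable
pronunciation. The function names, the romanization and the list of
"solar" consonants below are assumptions made for this example only;
the plain romanization used here does not distinguish the emphatic
consonants, and the sample words are ordinary Arabic nouns chosen
for illustration.

```python
# Romanized "solar" consonants (an assumed, simplified list).
# Two-letter spellings come first so that "sh" is tried before "s".
SOLAR = ("sh", "th", "dh", "t", "d", "r", "z", "s", "l", "n")


def spell_article():
    """The spelling never changes: always ALEPH LAM."""
    return "ALEPH LAM"


def pronounce_with_article(word):
    """Return a romanized pronunciation of the article plus the word."""
    for consonant in SOLAR:
        if word.startswith(consonant):
            # The LAM assimilates to the following solar consonant.
            return "a" + consonant + "-" + word
    # Before "lunar" consonants the article is pronounced /al/.
    return "al-" + word


for word in ("shams", "qamar", "nur"):
    print(spell_article(), "->", pronounce_with_article(word))
```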


PROBLEMS AND PRINCIPLES IN ADAPTING ARABIC SCRIPT 


Potential orthographic controversies abound in developing
Arabic-based scripts, though probably no more than in other
traditions.


In some parts of the world, the choice between an
Arabic-based script and other potential scripts is itself
controversial. Arabic script may be the only practical
alternative in some areas, but where alternatives exist,
many political and religious factors will enter into the
decision. In such cases, the use of di-script and bridge
materials may prove to be the most practical, and could in
fact deflect much of the controversy.


Even within the Arabic script family, more than one 
potential model alphabet and script style may exist. 
Language attitudes will cause some people to want their 
language to be identifiably distinct; others may prefer to 
minimize such distinctions. Choosing and promoting a 
standard calligraphic style is another potential source of
controversy. 


When a language requires additional graphemes, there may be
alternate choices. Several possibilities may exist for new
consonants, especially when multiple orthographies exist for
the same or nearby languages. Consonants representing
non-native sounds may be retained for loan words, eliminated
altogether, or even reassigned to new sounds. New vowel
letters or diacritics may need to be created, and a decision
made on whether and when to write the diacritics.


Word division must also be decided. Prepositions and other 
very short words are sometimes connected to an adjacent 
word. Long words with many affixes, on the other hand, may 
be better divided into several ligatures. 


Several principles may help reach acceptable orthography 
decisions. 


First, learn all you can about the orthographic environment. 
Have there been any previous orthographies for the language
in question? What are the traditions of script modification 
for other local and regional languages? Are there several 
orthographic traditions? How can your script modifications 
fit in with these traditions? 


Second, learn all you can about the sociolinguistic 
environment. What roles do different scripts (e.g., Arabic 
and Roman or Urdu and Sindhi) play in the area? What are 
the attitudes towards these different script families? How 
can you take these attitudes into consideration in 
developing script(s) in your situation? 


Third, learn all you can about the linguistic environment. 
What dialects exist? What is the phonetic inventory? How 
much semantic load do different phonemes carry? What 
special features like morphophonemics and affixation need
to be considered in developing the script? How can the 
orthography best represent these features within the 
orthographic traditions? 


Most of these principles apply to any kind of script. But
with ingenuity in applying them to Arabic-based scripts,
good orthographies can and should be developed which fit
into the great linguistic and calligraphic traditions of the
Arabic script world.


Appendix: Arabic Script Questionnaire


I am very interested in finding out information about other
Arabic-based scripts: primarily the major trade languages,
and secondarily adaptations made for smaller languages.
This questionnaire may help you focus your thoughts. Even
photocopied alphabet charts would help greatly!

Chas. Meeker, c/o Don Gregson
Horsleys Green, High Wycombe
Bucks HP14 3XL ENGLAND


A. General Information

1. Your name, address and date.

2. Language(s)

3. Country(ies) of use

4. Any useful references


B. Orthographic Factors 

1. Alphabet. List the letters in alphabetical order (is it 
standard?), along with their names, phonetic value(s) 
and the normal transliteration. Show the initial, 
medial and final forms as needed. 

2. Consonants. List all consonant phonemes on a phone 
chart. 

3. Vowels. List all vowels and diphthongs with their
phonetic values, transliteration and how written. 

4. Vowel chart. List all vowel phonemes on a phone chart. 

5. Suprasegmentals. List any suprasegmentals (stress, tone,
length, etc). How significant are they in the 
language? Are these indicated in the script? 
Describe. 

6. List any other marks used in the writing system and 
explain their use. 

7. What calligraphic style is usually used in printing? in 
handwriting? typewriting? computer output? Please 
include samples from newspapers, etc. 


C. Linguistic Factors 

1. Are any sounds spelled more than one way? (i.e., causing 
"spelling" problems, like Spanish b/v or English 
ch/tch). Please list, and tell how the different 
spellings are conditioned: environmentally, 
derivationally, historically, lexically or other. 

2. Are there any regional or other variations in the way the 
script is written? Describe. 

3. Are "non-native" letters retained in spelling borrowed 
words? Describe. 

4. Are there any "foreign" phonemes used only in borrowed 
words? Describe. 

5. Are any letters pronounced in more than one way? (i.e., 
causing "reading" problems, like English 'c' = /k/ or 
/s/). Please list, and tell how the different 
pronunciations are conditioned: environmentally, 
derivationally, historically, dialectally, lexically 
or other. 

6. Are any dialect differences successfully covered by the 
script? (e.g., phonetic shifts) Describe. 

7. Are there any difficulties in the script caused by 
regional or other dialects? 

8. Are there any digraphs -- e.g., two graphemes to 
represent one phoneme, like English 'sh' or 'th'? 
Describe. 

9. Are there any "portmanteau" graphemes which represent 
two phonemes, like English 'x' = /ks/? Describe. 


10. Are there any morphophonemic spellings? Describe and 
give examples. 

11. Are vowels written? Under what circumstances? 

12. Are there differences between oral pronunciation and 
spelling pronunciation? (e.g., English "wanna" vs. 
"want to") Describe and give examples. 

13. Are there differences between spelling pronunciation 
and writing? (e.g., English /kom/ vs. 'comb'). 
Describe and give examples. 

14. What rule(s) are used for word division? 

15. Are there any prefixes or suffixes which are part of the 
grammatical or phonetic word but which are separated, 
as a separate orthographic word? Describe. 

16. Are there any particles or clitics which are separate 
phonetic or grammatical words, but which are written 
connected, as part of the orthographic word? Describe. 

17. How good is the linguistic "match" between the script 
and the language? 

18. Are there other "mis-matches" besides those listed in 
the above questions? Describe. 

19. Do the "mismatches" hinder (or help) the script's 
usefulness? 

20. Are there any other interesting or unusual things you 
can mention about the script in this language? 


D. Sociolinguistic factors 

1. How strong is the use of script in this language? 

2. Is script used in any mass media? 

3. What is the competition to script use? (illiteracy, 
literacy in another script, literacy in another 
language (same or different script)) 

4. Is another script used for the same or similar language? 
Where and by whom? 

5. What factors influence script choice (religion, country, 
occupation, genre, etc.)? Describe. 

6. Is script use stable or changing? Describe. 

7. When was this script developed for this language? By 
whom? 

8. What language (script) is it based on? 

9. Does it seem that an overt effort was made to make the 
script different from that of any particular language? 
Describe. How does this relate to language attitudes? 

10. Are there any government or private language 
authorities, academies or orthography boards? 
Describe. 

11. Have there been any major decisions made as to script 
adoption or rejection? When, by whom and for what 
purposes? Describe. 

12. What are the major points of controversy in the 
orthography? Describe here or summarize from the above 
answers. 

13. Is orthography stable or changing? Describe. 


